Background. Evaluating residency programs requires objective assessment tools,
but few are readily available. The purpose of this study was to measure
education by correlating resident test scores with several measurements of
educator performance.

Materials and methods. The study group included residents and educators from a
single residency program. We performed a retrospective analysis of scores from
the Orthopaedic In-Training Examination collected during a 6-year period.
Resident examination scores were indexed by dividing program averages by
national averages to determine yearly score trends and then were correlated
with educator attendance and teaching hours. Subspecialty scores were ranked to
gauge residency strengths and weaknesses. Teaching hours devoted to
subspecialties were compared with test scores to measure curricular emphases
and to appraise teaching efficiency.

Results. Yearly average examination scores were proportional to national
averages (P < 0.001). However, of 3456 possible educator-score associations,
only 15 scores correlated highly (r > 0.9) with educators, and only 26 were
significant (P < 0.05). Trend analysis put subspecialty scores in yearly
perspective. Ranking was inaccurate until scores were indexed to the national
average. In 2002, the distribution of 238 teaching hours ranged from 4 to 48 h
for subspecialties, and 9 of 12 subspecialties were emphasized
disproportionately to the examination. Teaching efficiency varied more than
10-fold by subspecialty.

Conclusions. The creation of a score index helped to identify and address
imbalances between teaching hours devoted to subspecialties and resident needs
as evidenced by low In-Training examination scores. The present study improved
educator accountability by correlating measurements of teaching and learning.

Key Words: graduate medical education measurement; residency; surgery;
assessment; curriculum; academic medicine.


INTRODUCTION 

Residency education shapes clinical practice, and regular assessment helps
verify that residency education is satisfactory [1-4]. Educators seek useful
assessment tools [4, 5], but few specific assessment methods are readily
available [4, 6]. For example, tools offered by the Accreditation Council for
Graduate Medical Education focus on individual resident assessment [7], but we
found these tools problematic for residency program assessment: the assessment
of individuals helped educators appraise individuals but did not help educators
manage program-as-a-whole issues, like the quantities of curriculum by
subspecialty. The need to assess residency education with objective criteria
was plain yet unfulfilled [3-5], so we devised tools from available data to
quantify education. We derived assessment tools from the Orthopaedic
In-Training Examination (OITE), an evaluation of overall learning within 12
subspecialties used for medical knowledge assessment since 1963 [8-11]. The
written examination covers 12 clinical topics, averages 265 total questions,
and lasts up to 5 h as taken by each resident [12]. The OITE, sponsored by the
American Academy of Orthopaedic Surgeons, is a valid indicator of medical
knowledge [13]. Scorers from the American Academy of Orthopaedic Surgeons
report test results as the number of subspecialty questions and the program and
national averages of correct answers by subspecialty [14].

The aim of the present study was to measure teaching and learning to identify
areas in need of redress. We measured the success of our residency by
correlating educator attendance with resident OITE scores, measured learning by
yearly scoring trends, ranked subspecialty scores, counted teaching hours,
compared teaching hours and program scores, and calculated efficiency of
teaching.

MATERIALS AND METHODS
Study Design

We designed a retrospective study to analyze data gathered from 1997 to 2002 to
test the hypothesis that there was a difference in OITE scores as a function of
educator attendance. The OITE is divided by subspecialties, such as Spine,
Hand, and Hip and Knee Reconstruction, or by general categories, such as
Orthopaedic Diseases, which includes musculoskeletal tumors. We use the term
subspecialty generally to refer to all test categories in the present study.
Table 1 contains definitions of study terms. The study group averaged 10 staff
surgeons and 20 residents from a single residency.

Program Scores, National Scores, and the Score Index

We averaged resident scores from the OITE to get the annual program average by
subspecialty. Program averages were dependent variables. We plotted the program
average (number of correct answers divided by subspecialty questions) against
the national average and used linear regression to determine the relationship
between program and national scores (program average = 0.85 × national average
+ 0.11, P < 0.001, r = 0.855, r² = 0.73). To account for variability in yearly
scores due to the difficulty of subspecialty topics and variations in the
number of questions between tests, and to permit comparisons among
subspecialties, we created a score index. The score index for each subspecialty
was obtained by dividing the program average by the national average and
constituted another dependent variable to evaluate program and educator
effectiveness by subspecialty.
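As a minimal sketch of the score index arithmetic, the following Python snippet divides a program average by the national average. The function name `score_index` is hypothetical and not part of the study; the two input pairs are the lowest and highest data points reported for Fig. 1 in the Results.

```python
def score_index(program_average: float, national_average: float) -> float:
    """Program average divided by national average.

    Both inputs must use the same unit (both percents or both fractions).
    """
    if national_average <= 0:
        raise ValueError("national average must be positive")
    return program_average / national_average


# Lowest and highest Fig. 1 data points as reported in the Results (percents).
low_point = score_index(49, 39)
high_point = score_index(95, 94)
print(f"{low_point:.2f} {high_point:.2f}")
```

An index above 1 means the program scored above the national average for that subspecialty, whatever the raw difficulty of the questions.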

Educator Associations with Resident Examination Scores

Educator attendance, the presence or absence of individual surgeons within the
program, was an independent variable. A surgeon had to be in the program for
four or more months prior to the examination to qualify as in attendance. For
example, if a surgeon was deployed to war for more than 8 months, then that
surgeon was “absent” as an educator regarding that year’s examination. Surgeon
attendance (presence or absence) was transformed to a scale variable (1 or 0,
respectively). Surgeon attendance was correlated with subspecialty program
averages and score index. We used correlations to screen the data to determine
which relationships needed further analysis by score trends over time.
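The screening step can be sketched as follows, assuming one attendance value (1 or 0) and one score index per examination year. The helper `pearson_r` and every number in the two lists are hypothetical illustrations, not study data; the study itself used SPSS.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A surgeon present (or absent) in every year has no variance,
        # so the correlation is undefined for that surgeon.
        raise ValueError("correlation undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for 1997-2002: a surgeon present only for the 1999 test,
# and a subspecialty score index that dips in that year.
attendance = [0, 0, 1, 0, 0, 0]
index_by_year = [1.00, 1.02, 0.70, 0.98, 1.01, 0.99]

r = pearson_r(attendance, index_by_year)
print(f"r = {r:.3f}")
```

With only six yearly points and a binary predictor, a high |r| can arise from a single unusual year, which is one reason to follow screening with graphical trend analysis.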

Examination Score Trends

We charted subspecialty score trends over time to compare educator attendance
with the score index and national and program scores. Differences of less than
10% between program and national averages were considered small; differences of
10% or more were considered large. If correlation and graphical analysis showed
a relationship between surgeon attendance and scores within the surgeon’s
subspecialty, then we confirmed the relationship with analysis of variance.

TABLE 1

Definitions of Educational Terms

Term              Definition
----------------  -------------------------------------------------------------
Program average   For the residency, the number of correct OITE answers divided
                  by the total number of questions per subspecialty, expressed
                  as a percent.
National average  Nationwide, the number of correct OITE answers divided by the
                  total number of OITE questions per subspecialty, expressed as
                  a percent.
Score index       Program average divided by the national average.
Teaching hours    Residency teaching hours per subspecialty divided by total
                  teaching hours for all subspecialties, expressed as a
                  percent.
Emphasis          Teaching hours divided by percentage of OITE questions per
                  subspecialty.
Efficiency        Score index expressed as a percent divided by percent
                  teaching hours per subspecialty.

Subspecialty Score Rankings

We ranked subspecialties both subjectively and objectively, the latter by the
score index (program average divided by national average), to see how well we
assessed high and low scores. We ranked subspecialties subjectively by
surveying 12 residency personnel (7 residents and 5 educators) in 2002. The
survey listed the 12 subspecialties and asked respondents to rank subspecialty
performance for the program on the 2002 test. Subspecialties were ranked 1 to
12 from highest to lowest, and the average subjective ranks were compared with
the ranked score index for 2002. Results were displayed graphically.
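The comparison of subjective and objective ranks can be sketched as below. The helper `rank_descending`, the four subspecialties chosen, and all values are hypothetical; the sketch does not handle ties, which a real analysis would need to address.

```python
def rank_descending(values: dict) -> dict:
    """Map each key to its rank (1 = highest value). Ties are not handled."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Hypothetical 2002 score index values and average survey ranks.
score_index_2002 = {
    "Hand": 1.08,
    "Spine": 0.93,
    "Trauma": 1.01,
    "Sports Medicine": 0.97,
}
subjective_rank = {"Hand": 2, "Spine": 1, "Trauma": 3, "Sports Medicine": 4}

objective_rank = rank_descending(score_index_2002)

# Positive gap: the survey placed the subspecialty lower than its score index
# did; negative gap: the survey placed it higher.
rank_gap = {
    name: subjective_rank[name] - objective_rank[name] for name in objective_rank
}
print(objective_rank)
print(rank_gap)
```

A large absolute gap flags a subspecialty whose perceived performance differs from its indexed performance.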

Teaching Hours, Emphasis, and Efficiency

We counted educational hours in 1-h blocks from the 2002 curriculum schedule to
quantify subspecialty teaching effort devoted to events. Events included
lectures, conferences, workshops, grand rounds, and symposia. We included only
events that ten or more residents attended as a group. Resident availability
averaged 11 residents during the study period because of external rotations. At
least one attending participated in all educational events except for some
resident group study sessions. Forty-eight hours of group study were not
subspecialty-specific and were not counted. Teaching hours were considered as
percentages (number of subspecialty hours divided by total teaching hours;
Table 1).

We wanted to see how teaching hours were balanced with the examination, and so
we calculated emphasis (percent teaching hours divided by the percentage of
subspecialty questions). The benchmark emphasis was 1, and we displayed the
results graphically. High emphasis was above 1.5, and low emphasis was below
0.5. To assess how teaching hours related to scores, we calculated efficiency
(score index expressed as a percent divided by percent teaching hours) and
displayed the results graphically.
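A minimal sketch of the emphasis and efficiency calculations follows. The function names are hypothetical. The hour counts (48 and 4 of 238) are those reported in the Results; the percentage of questions (10%) and the score index of 1.22 for Orthopaedic Science are assumptions made only for illustration, the latter read from the statement that the 2002 program average was 22% above the national average.

```python
def percent_teaching_hours(subspecialty_hours: float, total_hours: float) -> float:
    """Subspecialty hours as a percent of all teaching hours."""
    return 100.0 * subspecialty_hours / total_hours


def emphasis(pct_hours: float, pct_questions: float) -> float:
    """Percent teaching hours divided by percent of examination questions."""
    return pct_hours / pct_questions


def classify_emphasis(value: float) -> str:
    """Apply the study's thresholds: above 1.5 is high, below 0.5 is low."""
    if value > 1.5:
        return "high"
    if value < 0.5:
        return "low"
    return "balanced"


def efficiency(score_index: float, pct_hours: float) -> float:
    """Score index expressed as a percent, divided by percent teaching hours."""
    return 100.0 * score_index / pct_hours


science_hours = percent_teaching_hours(48, 238)   # Orthopaedic Science
diseases_hours = percent_teaching_hours(4, 238)   # Orthopaedic Diseases

assumed_pct_questions = 10.0   # hypothetical share of examination questions
science_emphasis = emphasis(science_hours, assumed_pct_questions)

assumed_score_index = 1.22     # assumption, see lead-in
science_efficiency = efficiency(assumed_score_index, science_hours)

print(f"{science_hours:.1f}% {diseases_hours:.1f}%")
print(classify_emphasis(science_emphasis), round(science_efficiency))
```

Under these assumptions the efficiency for Orthopaedic Science comes out near 6, in line with the lower end of the range reported with Fig. 4.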

FIG. 1. OITE, program versus national averages. The linear regression line is
displayed. Twelve subspecialties during the course of 6 years created a total
of 72 data points. Program averages were directly proportional to national
averages (P < 0.001).

To determine the relationship between teaching hours or effort devoted to
subspecialties and the OITE emphasis on subspecialties, we plotted subspecialty
teaching hours as a function of the percentage of number of subspecialty
questions from 2002 and performed linear regression (teaching hours = 0.109 ×
questions + 0.074, P = 0.783, r = 0.089). We analyzed similarly the
subspecialty program scores and teaching hours (program score = 0.075 ×
teaching hours + 0.058, P = 0.708, r = 0.326), and score index and teaching
hours (score index = 0.404 × teaching hours + 1.051, P = 0.299, r = 0.121) when
score index and teaching hours were expressed as percentages.
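For readers who want to reproduce this kind of fit on their own curriculum data, a simple least-squares line can be sketched as below. The function `fit_line` and the four data points are hypothetical; they are chosen to lie exactly on a line so the result is easy to check, and they are not the study's data.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y as a function of x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical percentages: share of examination questions (x) and share of
# teaching hours (y) for four subspecialties, constructed as y = 0.5 * x + 1.
pct_questions = [4.0, 8.0, 10.0, 12.0]
pct_hours = [3.0, 5.0, 6.0, 7.0]

slope, intercept = fit_line(pct_questions, pct_hours)
print(f"teaching hours = {slope:.3f} x questions + {intercept:.3f}")
```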

Statistical Analysis

We used Pearson’s product-moment correlation coefficient to compare resident
scores to surgeon attendance. We counted high (r > 0.9) and significant (P <
0.05) correlations and then performed two-tailed t tests. Significant
associations (P < 0.05) were positive or negative. A positive association was
an increased subspecialty score with surgeon attendance, while a negative
association was a decreased score. We used Microsoft Office Professional 97 for
data management (Microsoft Inc., Redmond, WA) and SPSS, version 11.5 (SPSS
Inc., Chicago, IL) for statistical analysis.
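The link between a correlation coefficient and its two-tailed t test can be sketched as follows, assuming six yearly observations per association (so 4 degrees of freedom). The function names are hypothetical, and the closed-form P value below is specific to 4 degrees of freedom; this is an illustrative check, not the SPSS procedure used in the study.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic for a Pearson correlation r from n paired observations."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def two_tailed_p_df4(t: float) -> float:
    """Two-tailed P for Student's t with exactly 4 degrees of freedom.

    Uses the closed-form CDF for 4 degrees of freedom:
    F(t) = 1/2 + (3/8) * u * (1 - u**2 / 12), with u = t / sqrt(1 + t**2 / 4).
    """
    u = abs(t) / math.sqrt(1 + t * t / 4)
    cdf = 0.5 + 0.375 * u * (1 - u * u / 12)
    return 2 * (1 - cdf)


# The Sports Medicine correlation reported in the Results, with an assumed
# n of 6 examination years.
t_value = t_from_r(-0.960, 6)
p_value = two_tailed_p_df4(t_value)
print(f"t = {t_value:.2f}, P = {p_value:.4f}")
```

Under the six-observation assumption, r = -0.960 gives a P value close to the 0.002 reported for that association.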


RESULTS 

Program scores were directly proportional to national scores (Fig. 1). The
result was significant (P < 0.001), indicating that the difficulty of
individual subspecialty testing itself was a major factor in determining
scores. Points greater than 2 standard errors from the line were 10% different
from the expected value; this finding indicated that a score index (program
average divided by national average) may be a better assessment tool. Score
index results adjusted for differences in difficulty of subspecialty testing
and provided additional perspective to program scores. For example, in Fig. 1
the lowest (far left; 49% program average, 39% national average, 1.26 score
index) and highest (far right; 95%, 94%, 1.01) data points showed no clear
relationship between program and national averages, but the score index made
the results clearer. The r² value of 0.73 indicated that 73% of the variability
can be explained by national average; in other words, the difficulty of the
test was a dominant factor in determining mean resident scores in our program.
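As a worked illustration of the reported regression line, the sketch below evaluates program average = 0.85 × national average + 0.11 at the two Fig. 1 data points quoted above. It assumes the coefficients apply to averages expressed as fractions (0 to 1), which the intercept of 0.11 suggests but the text does not state; the function name is hypothetical.

```python
def expected_program_average(national_average: float) -> float:
    """Reported regression line; averages assumed to be fractions (0 to 1)."""
    return 0.85 * national_average + 0.11


r = 0.855
r_squared = r ** 2  # share of variability explained by the national average

# Lowest and highest Fig. 1 points: (national average, program average).
low_expected = expected_program_average(0.39)
high_expected = expected_program_average(0.94)
low_residual = 0.49 - low_expected    # observed minus expected
high_residual = 0.95 - high_expected

print(f"r^2 = {r_squared:.2f}")
print(f"expected {low_expected:.3f} and {high_expected:.3f}")
print(f"residuals {low_residual:+.3f} and {high_residual:+.3f}")
```

Both points sit a few hundredths above the line under this assumption, yet their score indexes (1.26 and 1.01) differ markedly, which illustrates why the index gives a different perspective from the raw averages.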


Educator Associations With Resident Scores

Calculations for 24 attending surgeons and 12 subspecialties during the course
of 6 years for 2 score measures (program averages and score index results)
created 3456 associations. The program average had three high (r > 0.9) and
eight significant (P < 0.05) correlations with surgeons. The score index had 12
high and 18 significant correlations with surgeons. Most correlations appeared
coincidental on graphical analysis, but one program average correlation, for
Sports Medicine with surgeon Z (r = -0.960), was significant (P = 0.002), and
the effect was large (Fig. 2A). Surgeon Z was not a sports medicine surgeon and
was present only in 1999.
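The count of associations follows directly from the four factors named above, and a short sketch makes the proportions explicit. The variable names are hypothetical; the numbers are those reported in this section.

```python
surgeons = 24
subspecialties = 12
years = 6
score_measures = 2  # program average and score index

total_associations = surgeons * subspecialties * years * score_measures

# Counts reported for the program average plus those for the score index.
high_correlations = 3 + 12
significant_correlations = 8 + 18

high_share = 100 * high_correlations / total_associations
significant_share = 100 * significant_correlations / total_associations

print(total_associations)
print(f"{high_share:.2f}% high, {significant_share:.2f}% significant")
```

Fewer than 1% of the possible associations were either high or significant, which fits the observation that most appeared coincidental on graphical analysis.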

Score Trends

We displayed subspecialty score trends by year (Fig. 2A and B). In 2002 for
Orthopaedic Science, the program average was 22% above the national average
(Fig. 2B). Surgeons B and U were available in 2002 only, whereas surgeons I and
M were available in all years except 2002. These associations were significant
for the score index (P = 0.002) but not for program score (P > 0.05).

Subspecialty Score Rankings

The list of 2002 subjective subspecialty rankings differed from the objective
ranking, with five subspecialties differing by four to eight positions. Highest
ranked score index results were in Orthopaedic Science, Orthopaedic Diseases,
Hand, and Rehabilitation; lowest ranked score index results were in Sports
Medicine, Pediatric Orthopaedics, Hip and Knee Reconstruction, and Spine in
descending order (Fig. 3). The worst three score index results on both 2002 and
6-year rankings were the same (i.e., Pediatric Orthopaedics, Spine, and Hip and
Knee Reconstruction), but only Pediatric Orthopaedics was subjectively ranked
in the bottom three positions. Subjective ranks for five subspecialties were
misleading in that they were different from objective ranks. Radial graphing of
objective rankings indicated relative scores precisely for every subspecialty
(Fig. 3). The general improvement from the 6-year average to the 2002 scores
may be from increased educator interest in academics.

Analysis of Teaching Hours, Emphasis, and Efficiency

For teaching hours, the distribution of 238 educational hours ranged from 1.7%
(4/238) for Orthopaedic Diseases to 20.2% (48/238) for Orthopaedic Science
(Fig. 4A). The most teaching hours were devoted to Orthopaedic Science, Trauma,
and Hand; the fewest hours to Rehabilitation, Sports Medicine, and Orthopaedic
Diseases in descending order. Orthopaedic Science was the only topic
systematically delegated to all educators, and many teaching hours were
expected after the curriculum had been redesigned at the start of the study
period.

FIG. 2. (A) Sports Medicine, average scores and index. Program average (black
circles), national average (white circles), and Sports Medicine score index
(squares). Score index corroborates the greatest difference between program and
national scores for 1999. (B) Orthopaedic Science, average scores and index.
Program average (black circles), national average (white circles), and
Orthopaedic Science score index (squares). Score index reveals similar program
and national averages until a sharp divergence for 2002.

Comparing scores by emphasis (percent teaching hours divided by the percentage
of subspecialty questions), Orthopaedic Science, Shoulder and Elbow,
Rehabilitation, Foot and Ankle, Hand, and Spine had high emphasis (Fig. 4B),
and three of these six subspecialties also had high score index results for
2002 (Fig. 3). Pediatric Orthopaedics had both a low emphasis and a low score
index. Three of our 4 best score index results had high emphasis, whereas 2 of
our 4 lowest score index results had low emphasis. Orthopaedic Diseases and
Pediatric Orthopaedics were external rotations with expected low emphasis.
Orthopaedic Diseases had the second highest score index with the lowest
emphasis, so its teaching had high efficiency (score index divided by teaching
hours, Fig. 4C).

FIG. 3. Subspecialty objective rankings by score index. 2002 (white circles)
and 6-year (black squares). The score index benchmark of 1 equals the national
average. 2002 score index results are generally better than the 6-year average.


Rankings seemed associated with teaching hours for the subspecialties at the
highest and lowest teaching hours, but the association between teaching hours
and the number of subspecialty questions was not significant (P = 0.78, Fig.
5A). Teaching hours did not reflect the number of subspecialty examination
questions. Furthermore, the relationships between resident scores and teaching
hours (P = 0.71, Fig. 5B) and between score index and teaching hours (P = 0.30,
Fig. 5C) were not significant.

DISCUSSION 

The present educational study demonstrated several methods for assessing
graduate medical education. Measuring residency education with specific tools
encompassed both teaching and learning, and offered a way to tally academic
work. Measuring teaching hours and teaching efficiency quantified academic
contribution. We were able to assess our curriculum by comparing subspecialty
emphases to a national standard. The analysis detected mismatches between what
was taught (i.e., emphasized) and what was tested. Curriculum content
adjustments may now be made based on counted teaching hours and scores indexed
to the nation. Ranking subspecialties by index results permitted clearer
identification of weak and strong subspecialties. Teacher interests and
resident needs (low scores) became obvious upon objective ranking. Before this
study, surgeon interests were not guided by resident needs in part because we
lacked the proper tools to compare student needs (low scores) and teacher
interests. Measuring resident score index results permitted statistical testing
of educational hypotheses. Because of duty hour restrictions for residents,
educators cannot assess curricular content alone but need to assess balances of
curricular content, i.e., what is called emphasis in the present study. In a
full curriculum with a time restriction, changes must be made cautiously
because an addition means other content must go, making choices a zero-sum
game. As we found no statistical relationship between the quantity of teaching
and resident scores, a central issue may be a better balance of subspecialty
teaching quantities and not more teaching per se. The quality of teaching may
be more important than the quantity.

FIG. 4. (A) Percentage of total teaching hours, 2002. (B) Teaching emphasis by
subspecialty, 2002. An emphasis of 1.0 ± 0.5 meant that the subspecialty
teaching hours were balanced with the examination. Nine of 12 subspecialty
emphases were unbalanced with the examination. (C) Teaching efficiency by
subspecialty in 2002. High efficiency (score index divided by teaching hours)
was in Orthopaedic Diseases, Sports Medicine, and Rehabilitation, whereas low
efficiency was in Hand, Spine, Trauma, and Orthopaedic Science in descending
order. Efficiency ranged from 6 for Orthopaedic Science to 69 for Orthopaedic
Diseases.

The findings and methods of the present study are novel contributions to
medical educational research. The study introduces ways to test educational
premises in residency statistically, and reports a quantitative analysis of
teaching hours, emphasis, and efficiency. With new, specific, and objective
assessment tools for educational effectiveness, the present study used
available data in original ways to identify areas for redress or further study.
The residency needed a more flexible tool than program average to compare to
national data, to rank subspecialty strengths and weaknesses, and to compare
yearly trends. The score index had more statistical associations than program
average, and was more useful as an assessment tool. The score index assessed
residency education thoroughly and permitted a detailed analysis. Trend
analysis displayed scoring changes that were undetected prior to score index
use. Subspecialty ranking capacity was poor until the score index was used, as
the score index corrected misconceptions on subspecialty strengths and
weaknesses.

An essential ingredient to teacher effectiveness is not subspecialty training
but educator interest (both self-declared and manifested by behavior) in
teaching, the evidence of which we found in various forms. Volunteering extra
lectures, surgeon participation at conferences and in resident assessment, and
passing on key summaries of current knowledge from subspecialty meetings
illustrated interest in teaching relevant to the curriculum. Surgeons taught
subjects of their own interest; for example, two surgeons were hand surgeons
and gave many hand teaching hours, resulting in a high score index. However,
surgeon disinterest in teaching was also evident; surgeons disregarded resident
needs evidenced by low scores and taught little outside areas of surgeon
interest. Disinterest also was evidenced by not volunteering to fill schedule
voids, poor attendance at educational events, late arrival at and early
departure from events, rescheduling events because of inadequate preparation,
and subject switching regardless of resident need. Resident needs demonstrated
by low subspecialty scores did not trump educator interest, so we assertively
addressed the tension between surgeon interests and resident needs by
reinforcing schedules, rewarding compliance, and hosting consultants to address
subspecialties in which coverage was weak. Educator behavior and not educator
training per se was a central issue affecting resident scores.

FIG. 5. (A) Relationship of percent teaching hours to percent examination
questions, 2002. The regression line is displayed. No significant association
was found between the quantities of teaching and testing. (B) Percent program
score as a function of percent teaching hours, 2002. The regression line is
displayed. No significant association was found between the quantity of
teaching and resident scores. (C) Score index as a function of percent teaching
hours, 2002. The regression line is displayed. No significant association was
found between the quantities of teaching and resident scores.

The strengths of the present study are several. This evidence-based research
applied scholarly processes to education. An original design permitted a
comprehensive analysis of education with intriguing results of interest to
medical educators. Measuring surgeon capacity to teach and contribute
academically indicated both academic confidence and accountability. The present
work showed residents the value of being effective educators and researchers.
The educational process and research caused the hospital administration, the
department, the medical educators, the Residency Review Committee evaluator,
the medical students, and the house staff to take note. Quantifying educational
teaching hours allowed comparison of the curriculum balance to an external
standard and yielded the first steps toward an outcomes-based curriculum.
Creative methods solved needs.

The weaknesses of the present study are several. The examination has limits in
assessing education because it is mainly a knowledge assessment tool and does
not measure competence well [9, 15]. The examination may not be the best
arbiter of learning, nor does it cover all relevant parameters of clinical
practice [9]. The examination is an imperfect tool despite its specificity for
our residency, its objectivity, and its availability. Our study is
norm-referenced to national data and not criterion-referenced to behavior
standards [16]. The present study looks more at the curriculum and less at the
nature of instruction; for example, the present work did not account for
small-group teaching or self-study. The preponderance of teaching is outside of
the scheduled program curriculum, the focus of the present study [17]. This
focus limits the scope of the study conclusions to curriculum analysis and
constitutes a study limitation.

The present study looked more at residency variables than individual person
variables, although both are important. Residency issues (e.g., teaching hours
and emphases) affected scores, as did the attendance of some educators. The
residency executed a major redesign of the Orthopaedic Science curriculum in
2002 with increased teaching hours. Trend analysis helped discern that the high
number of Orthopaedic Science teaching hours seemed more important than
individual educator attendance (Fig. 2B). Further, surgeon Z, who was not a
sports medicine surgeon, and a Sports Medicine surgeon were involved in a
leadership struggle in 1999, and the low scores in Sports Medicine were obvious
(Fig. 2A). Surgeon Z was the only surgeon present only in 1999, and no surgeon
was absent only in 1999. Therefore, the high correlation and negative
association indicate an important relationship. Obviously, no educator wants to
have a negative effect, and the residency variables such as interpersonal
issues may be more important than individual person variables such as whether
an educator is present or absent. Education, a complex topic, requires study of
many important variables beyond the scope of the present work. The general
application and validity of novel tools and methods will need further testing
by other orthopaedic residencies and specialties, and studies are under way.
Although the findings of the present study are intriguing, they should be
applied with caution until they are validated.

In conclusion, the present study measured education by accounting for teaching
effectiveness, analyzing resident scoring trends, ranking subspecialty scores,
and accounting for teaching hours and emphases. The present study contributes
novel analyses to the growing body of knowledge regarding evidence-based
education. The educational tools in the present study helped identify specific
areas for redress objectively.